1. Introduction and Notation

We consider the filtering problem for a Markov chain $\{X_k, Y_k\}_{k \ge 0}$ with state $X$
and observation $Y$. The state process $\{X_k\}_{k \ge 0}$ is a homogeneous Markov chain
taking values in a measurable set $\mathsf{X}$ equipped with a $\sigma$-algebra $\mathcal{B}(\mathsf{X})$. We let $Q$ be the
transition kernel of the chain. The observations $\{Y_k\}_{k \ge 0}$ take values in a measurable
set $\mathsf{Y}$ ($\mathcal{B}(\mathsf{Y})$ is the associated $\sigma$-algebra). For $i \le j$, denote $Y_{i:j} = (Y_i, Y_{i+1}, \dots, Y_j)$.
Similar notation will be used for other sequences. We assume furthermore that, for
each $k \ge 1$ and given $X_k$, $Y_k$ is independent of $X_{1:k-1}$, $X_{k+1:\infty}$, $Y_{1:k-1}$, and $Y_{k+1:\infty}$.
We also assume that, for each $x \in \mathsf{X}$, the conditional law of $Y_k$ given $X_k = x$ has a density
$g(x, \cdot)$ with respect to some fixed $\sigma$-finite measure on the Borel $\sigma$-field $\mathcal{B}(\mathsf{Y})$.

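We denote by $\phi_{\xi,n}[y_{0:n}]$ the distribution of the hidden state $X_n$ conditionally on
the observations $y_{0:n} = [y_0, \dots, y_n]$, which is given by

$$\phi_{\xi,n}[y_{0:n}](A) = \frac{\int_{\mathsf{X}^{n+1}} \xi(dx_0)\, g(x_0,y_0) \prod_{i=1}^{n} Q(x_{i-1},dx_i)\, g(x_i,y_i)\, 1_A(x_n)}{\int_{\mathsf{X}^{n+1}} \xi(dx_0)\, g(x_0,y_0) \prod_{i=1}^{n} Q(x_{i-1},dx_i)\, g(x_i,y_i)} .$$  (1)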

In practice the model is rarely known exactly and therefore suboptimal filters 
are computed by replacing the unknown transition kernel, likelihood function and 
initial distribution by approximations. 

The choice of these quantities plays a key role both when studying the convergence
of sequential Monte Carlo methods and when analysing the asymptotic behaviour of
the maximum likelihood estimator (see e.g., (@) or Q) and the references
therein). A key point when analysing the maximum likelihood estimator or the stability
of the filter over an infinite horizon is to ask whether $\phi_{\xi,n}[y_{0:n}]$ and $\phi_{\xi',n}[y_{0:n}]$ are
close (in some sense) for large values of $n$ and two different choices of the initial
distribution, $\xi$ and $\xi'$.

The forgetting property of the initial condition of the optimal filter in nonlinear
state space models has attracted many research efforts and it is impossible to give
credit to every contributor. The purpose of the short presentation of the existing
results below is mainly to allow comparison of the assumptions and results presented in
this contribution with those previously reported in the literature. The
first result in this direction was obtained by ([13]), who established $L_p$-type
convergence of the optimal filter initialised with the wrong initial condition to the
filter initialised with the true initial distribution; their proof does not provide a rate
of convergence. A new approach based on the Hilbert projective metric was later
introduced to establish the exponential stability of the optimal filter
with respect to its initial condition. However, these results are based on stringent
mixing conditions for the transition kernels; these conditions state that there exist
positive constants $\epsilon_-$ and $\epsilon_+$ and a probability measure $\lambda$ on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ such that,
for $f \in \mathcal{B}_+(\mathsf{X})$,

$$\epsilon_- \lambda(f) \le Q(x, f) \le \epsilon_+ \lambda(f) , \quad \text{for any } x \in \mathsf{X} .$$  (2)

This condition implies in particular that the chain is uniformly geometrically ergodic.
Similar results were obtained independently by (0) using the Dobrushin ergodicity
coefficient (see ([10]) for further refinements of this result). The mixing
condition has later been weakened by (@), under the assumption that the kernel $Q$
is positive recurrent and is dominated by some reference measure $\lambda$:

$$\sup_{(x,x') \in \mathsf{X} \times \mathsf{X}} q(x,x') < \infty \quad \text{and} \quad \int \operatorname{essinf} q(x,x')\, \pi(x)\, \lambda(dx) > 0 ,$$

where $q(x, \cdot) = dQ(x,\cdot)/d\lambda$, essinf is the essential infimum with respect to $\lambda$ and $\pi\, d\lambda$ is
the stationary distribution of the chain $Q$. Although the upper bound is reasonable,
the lower bound is restrictive in many applications and fails to be satisfied, e.g., for
the linear state space Gaussian model.



In ([12]), the stability of the optimal filter is studied for a class of kernels referred
to as pseudo-mixing. The definition of a pseudo-mixing kernel is adapted to the case
where the state space is $\mathsf{X} = \mathbb{R}^d$, equipped with the Borel sigma-field $\mathcal{B}(\mathsf{X})$. A kernel
$Q$ on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ is pseudo-mixing if, for any compact set $C$ with a diameter $d$ large
enough, there exist positive constants $\epsilon_-(d) > 0$ and $\epsilon_+(d) > 0$ and a measure $\lambda_C$
(which may be chosen to be finite without loss of generality) such that

$$\epsilon_-(d) \lambda_C(A) \le Q(x,A) \le \epsilon_+(d) \lambda_C(A) , \quad \text{for any } x \in C,\ A \in \mathcal{B}(\mathsf{X}) .$$  (3)

This condition implies that, for any $(x',x'') \in C \times C$,

$$\frac{\epsilon_-(d)}{\epsilon_+(d)} \le \operatorname{essinf}_{x \in \mathsf{X}} \frac{q(x',x)}{q(x'',x)} \le \operatorname{esssup}_{x \in \mathsf{X}} \frac{q(x',x)}{q(x'',x)} \le \frac{\epsilon_+(d)}{\epsilon_-(d)} ,$$

where $q(x, \cdot) = dQ(x, \cdot)/d\lambda_C$, and esssup and essinf denote the essential supremum
and infimum with respect to $\lambda_C$. This condition is obviously more general than (2),
but it is still not satisfied in the linear Gaussian case (see ([12], Example 4.3)).

Several attempts have been made to establish stability under the
so-called small noise condition. The first result in this direction was obtained by
([1]) (in continuous time), who considered an ergodic diffusion process with constant
diffusion coefficient and linear observations: when the variance of the observation
noise is sufficiently small, ([1]) established that the filter is exponentially stable. Small
noise conditions also appeared (in a discrete time setting) in (|J) and ([14]). These
results do not cover the linear Gaussian state space model with arbitrary
noise variance.

More recently, (0) proved that the nonlinear filter forgets its initial condition in
mean over the observations for functions satisfying some integrability conditions.
The main result presented in that paper relies on the martingale convergence theorem
rather than on a direct analysis of the filtering equations. Unfortunately, this method
of proof cannot provide any rate of convergence.

It is tempting to assume that forgetting of the initial condition should be true in 
general, and that the lack of proofs for the general state-space case is only a matter 
of technicalities. The heuristic argument says that either 

• the observations $Y$'s are informative, and we learn about the hidden state $X$
from the $Y$'s around it, and forget the initial starting point; or

• the observations $Y$'s are non-informative, and then the $X$ chain is moving by
itself, and by itself it forgets its initial condition, for example if it is positive
recurrent.

Since we expect that the forgetting of the initial condition is retained in these two
extreme cases, it is probably so under any condition. However, this argument is
false, as is shown by the following examples, where the conditional chain does not
forget its initial condition whereas the unconditional chain does. On the other hand,
it can be that the observed process $\{Y_k\}_{k \ge 0}$ is not ergodic, while the conditional chain
uniformly forgets the initial condition.

Example 1. Suppose that $\{X_k\}_{k \ge 0}$ are i.i.d. $B(1, 1/2)$. Suppose $Y_i = 1(X_i = X_{i-1})$.

Then $P(X_i = 1 \mid X_0 = 0, Y_{0:n}) = 1 - P(X_i = 1 \mid X_0 = 1, Y_{0:n}) \in \{0, 1\}$.
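
The non-forgetting in Example 1 can be checked mechanically. The sketch below assumes the reading $Y_i = 1(X_i = X_{i-1})$ of the example (the observation records whether the state changed); the function names `simulate` and `reconstruct` are illustrative. Under that assumption the whole path is a deterministic function of $X_0$ and the observations, so the two conditional laws started from $X_0 = 0$ and $X_0 = 1$ are point masses on complementary paths.

```python
import random


def simulate(n, seed=0):
    """Draw X_0..X_n i.i.d. Bernoulli(1/2) and Y_i = 1(X_i == X_{i-1}) for i >= 1."""
    rng = random.Random(seed)
    xs = [rng.randint(0, 1) for _ in range(n + 1)]
    ys = [int(xs[i] == xs[i - 1]) for i in range(1, n + 1)]
    return xs, ys


def reconstruct(x0, ys):
    """Given X_0 = x0 and Y_1..Y_n, rebuild the only path compatible with them."""
    path = [x0]
    for y in ys:
        # y == 1 means "same as previous state", y == 0 means "flipped".
        path.append(path[-1] if y == 1 else 1 - path[-1])
    return path


xs, ys = simulate(20)
path_from_0 = reconstruct(0, ys)
path_from_1 = reconstruct(1, ys)
```

Whatever the observations, `path_from_0` and `path_from_1` disagree at every time index, so the influence of the initial state never fades even though the unconditional chain is i.i.d.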

Here is a slightly less extreme example. Consider a Markov chain on the unit
circle. All values below are considered modulo $2\pi$. We assume that $X_i = X_{i-1} + U_i$,
where the state noise variables $\{U_k\}_{k \ge 0}$ are i.i.d. The chain is hidden by additive white noise:
$Y_i = X_i + E_i$, $E_i = \pi W_i + V_i$, where $W_i$ is a Bernoulli random variable independent
of $V_i$. Suppose now that $U_i$ and $V_i$ are symmetric and supported on some small
interval. The hidden chain does not forget its initial distribution under this model.
In fact, the support of the distribution of $X_i$ given $Y_{0:n}$ and $X_0 = x_0$ is disjoint from
the support of its distribution given $Y_{0:n}$ and $X_0 = x_0 + \pi$.

On the other hand, let $\{Y_k\}_{k \ge 0}$ be an arbitrary process. Suppose it is modeled
(incorrectly!) by an autoregressive process observed in additive noise. We will show
that, under different assumptions on the distribution of the state and the observation
noise, the conditional chain (given the observations $Y$'s, which are not necessarily
generated by the model) forgets its initial condition geometrically fast.

The proofs presented in this paper are based on a generalization of the notion of
small sets and on the coupling of two (non-homogeneous) Markov chains sampled from
the distribution of $X_{0:n}$ given $Y_{0:n}$. The coupling argument is based on presenting
two chains $\{X_k\}$ and $\{X'_k\}$, which marginally follow the same sequence of transition
kernels, but have different initial distributions of the starting state. The chains move
independently until they couple at a random time $T$, and from that time on they
remain equal.

Roughly speaking, the two copies of the chain may couple at a time $k$ if they
stand close to each other. Formally, we mean by that that the pair of states
of the two chains at time $k$ belongs to some set, which may depend on the current, but
also on past and future observations. The novelty of the current paper lies in considering
sets which are genuinely defined by the pair of states. For example, the set
can be defined as $\{(x,x') : |x - x'| \le c\}$, that is, close in the usual sense of the
word.

The prototypical example we use is the non-linear state space model:

$$X_i = a(X_{i-1}) + U_i ,$$
$$Y_i = b(X_i) + V_i ,$$  (4)

where $\{U_k\}_{k \ge 0}$ is the state noise and $\{V_k\}_{k \ge 0}$ is the measurement noise. Both
$\{U_k\}_{k \ge 0}$ and $\{V_k\}_{k \ge 0}$ are assumed to be i.i.d. and mutually independent. Of course,
the filtering problem for the linear version of this model with independent Gaussian
noise is solved explicitly by the Kalman filter. But this is one of the few non-trivial
models which admit a simple solution. Under the Gaussian linear model, we
argue that, whatever $Y_{0:n}$ are, two independent chains drawn from the conditional
distribution will remain close to each other even if the $Y$'s are drifting away.
Any time they are close, they will be able to couple, and this will happen quite
frequently.

Our approach for proving that a chain forgets its initial conditions can be decom- 
posed in two stages. We first argue that there are coupling sets (which may depend 
on the observations, and may also vary according to the iteration index) where we 
can couple two copies of the chains, drawn independently from the conditional dis- 
tribution given the observations and started from two different initial conditions, 
with a probability which is an explicit function of the observations. We then argue 
that a pair of chains are likely to drift frequently towards these coupling sets. 

The first group of results identifies situations in which the coupling set is given in
a product form, and in particular situations where $\mathsf{X} \times \mathsf{X}$ is a coupling set. In the
typical situation, many values of $Y_i$ entail that $X_i$ is in some set with high probability,
and hence the two conditionally independent copies are likely to be in this
set and close to each other. In particular, this enables us to prove the convergence
of (nonlinear) state space processes with bounded noise and, more generally, in situations
where the tails of the observation errors are thinner than those of the dynamics
innovations.

The second argument generalizes the standard drift condition to the coupling set. 
The general argument specialized to the linear-Gaussian state model is surprisingly 
simple. We generalize this argument to the linear model where both the dynamics 
innovations and the measurement errors have strongly unimodal density. 

2. Notations and definitions 

Let $n$ be a given positive index and consider the finite-dimensional distributions
of $\{X_k\}_{k \ge 0}$ given $Y_{0:n}$. It is well known (see (0, Chapter 3)) that, for any positive
index $k$, the distribution of $X_k$ given $X_{0:k-1}$ and $Y_{0:n}$ reduces to that of $X_k$ given
$X_{k-1}$ only and $Y_{0:n}$. The following definitions will be instrumental in decomposing
the joint posterior distributions.

Definition 1 (Backward functions). For $k \in \{0, \dots, n\}$, the backward function $\beta_{k|n}$
is the non-negative measurable function on $\mathsf{Y}^{n-k} \times \mathsf{X}$ defined by

$$\beta_{k|n}(y_{k+1:n}, x) = \int \cdots \int Q(x, dx_{k+1})\, g(x_{k+1}, y_{k+1}) \prod_{l=k+2}^{n} Q(x_{l-1}, dx_l)\, g(x_l, y_l) ,$$  (5)

for $k \le n - 1$ (with the convention that the rightmost product is empty for
$k = n - 1$); $\beta_{n|n}(\cdot)$ is set to the constant function equal to 1 on $\mathsf{X}$.

The term "backward variables" is part of the HMM credo and dates back to the
seminal work of Baum and his colleagues (0, p. 168). The backward functions may
be obtained, for all $x \in \mathsf{X}$, by the recursion

$$\beta_{k|n}(x) = \int Q(x, dx')\, g(x', y_{k+1})\, \beta_{k+1|n}(x')$$  (6)

operating on decreasing indices $k = n - 1$ down to 0 from the initial condition

$$\beta_{n|n}(x) = 1 .$$  (7)
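
For a finite state space the integral in recursion (6) is a finite sum, which makes the backward pass easy to write down. The following is a minimal sketch under that assumption; the two-state kernel `Q`, the density `g` and the observation sequence `ys` are toy values chosen only for illustration.

```python
def backward_functions(Q, g, ys):
    """Return [beta_{0|n}, ..., beta_{n|n}] for a finite-state HMM.

    Q[x][x2] is the transition probability from x to x2, g(x, y) the
    observation density, and ys = [y_0, ..., y_n]. The loop runs recursion (6)
    on decreasing k, starting from the terminal condition (7).
    """
    n = len(ys) - 1
    d = len(Q)
    betas = [None] * (n + 1)
    betas[n] = [1.0] * d  # beta_{n|n} = 1
    for k in range(n - 1, -1, -1):
        nxt = betas[k + 1]
        betas[k] = [
            sum(Q[x][x2] * g(x2, ys[k + 1]) * nxt[x2] for x2 in range(d))
            for x in range(d)
        ]
    return betas


def g(x, y):
    """Toy observation density: the state is reported correctly with prob. 0.8."""
    return 0.8 if x == y else 0.2


Q = [[0.9, 0.1], [0.2, 0.8]]
ys = [0, 1, 1]
betas = backward_functions(Q, g, ys)
```

Note that `ys[0]` never enters the computation: $\beta_{k|n}$ depends only on the observations after time $k$.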
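Definition 2 (Forward Smoothing Kernels). Given $n \ge 0$, define for indices $k \in \{0, \dots, n - 1\}$
the transition kernels

$$F_{k|n}(x, A) = \begin{cases} [\beta_{k|n}(x)]^{-1} \int_A Q(x, dx')\, g(x', y_{k+1})\, \beta_{k+1|n}(x') & \text{if } \beta_{k|n}(x) \ne 0 , \\ 0 & \text{otherwise} , \end{cases}$$  (8)

for any point $x \in \mathsf{X}$ and set $A \in \mathcal{B}(\mathsf{X})$. For indices $k \ge n$, simply set

$$F_{k|n} = Q ,$$  (9)

where $Q$ is the transition kernel of the unobservable chain $\{X_k\}_{k \ge 0}$.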

Note that for indices $k \le n - 1$, $F_{k|n}$ depends on the future observations $Y_{k+1:n}$
through the backward variables $\beta_{k|n}$ and $\beta_{k+1|n}$ only. The subscript $n$ in the $F_{k|n}$
notation is meant to underline the fact that, like the backward functions $\beta_{k|n}$, the
forward smoothing kernels $F_{k|n}$ depend on the final index $n$ where the observation
sequence ends. For any $x \in \mathsf{X}$, $A \mapsto F_{k|n}(x, A)$ is a probability measure on
$\mathcal{B}(\mathsf{X})$. Because the functions $x \mapsto \beta_{k|n}(x)$ are measurable on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$, for any set
$A \in \mathcal{B}(\mathsf{X})$, $x \mapsto F_{k|n}(x, A)$ is $\mathcal{B}(\mathsf{X})/\mathcal{B}(\mathbb{R})$-measurable. Therefore, $F_{k|n}$ is indeed a
Markov transition kernel on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$.
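
On a finite state space the forward smoothing kernel is a row-stochastic matrix, which can be checked directly. The sketch below assumes the usual normalised form $F_{k|n}(x, x') = Q(x, x')\, g(x', y_{k+1})\, \beta_{k+1|n}(x') / \beta_{k|n}(x)$ for $k \le n - 1$, with rows set to zero where $\beta_{k|n}(x) = 0$ (that convention is an assumption of this sketch); the model values are toy choices.

```python
def backward_functions(Q, g, ys):
    """Backward functions beta_{k|n}, k = 0..n, for a finite-state HMM."""
    n, d = len(ys) - 1, len(Q)
    betas = [[1.0] * d for _ in range(n + 1)]
    for k in range(n - 1, -1, -1):
        betas[k] = [
            sum(Q[x][x2] * g(x2, ys[k + 1]) * betas[k + 1][x2] for x2 in range(d))
            for x in range(d)
        ]
    return betas


def smoothing_kernel(Q, g, ys, k):
    """Matrix of F_{k|n} for k <= n - 1 (assumed normalised form, see lead-in)."""
    betas = backward_functions(Q, g, ys)
    d = len(Q)
    F = []
    for x in range(d):
        if betas[k][x] == 0.0:
            F.append([0.0] * d)  # assumed convention when beta_{k|n}(x) vanishes
        else:
            F.append([
                Q[x][x2] * g(x2, ys[k + 1]) * betas[k + 1][x2] / betas[k][x]
                for x2 in range(d)
            ])
    return F


def g(x, y):
    """Toy observation density."""
    return 0.8 if x == y else 0.2


Q = [[0.9, 0.1], [0.2, 0.8]]
ys = [0, 1, 1]
F0 = smoothing_kernel(Q, g, ys, 0)
F1 = smoothing_kernel(Q, g, ys, 1)
```

Each row of `F0` and `F1` sums to one, and the rows differ from those of `Q` because the future observations tilt the transition probabilities.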

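Given $n$, for any index $k \ge 0$ and function $f \in \mathcal{F}_b(\mathsf{X})$,

$$E_\xi[f(X_{k+1}) \mid X_{0:k}, Y_{0:n}] = F_{k|n}(X_k, f) .$$

More generally, for any integers $n$ and $m$, function $f \in \mathcal{F}_b(\mathsf{X}^{m+1})$ and initial
probability $\xi$ on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$,

$$E_\xi[f(X_{0:m}) \mid Y_{0:n}] = \int \cdots \int f(x_{0:m})\, \phi_{\xi,0|n}(dx_0) \prod_{i=1}^{m} F_{i-1|n}(x_{i-1}, dx_i) ,$$  (10)

where $\{F_{k|n}\}_{k \ge 0}$ are defined by (8) and (9) and $\phi_{\xi,k|n}$ is the marginal smoothing
distribution of the state $X_k$ given the observations $Y_{0:n}$. Note that $\phi_{\xi,k|n}$ may be
expressed, for any $A \in \mathcal{B}(\mathsf{X})$, as

$$\phi_{\xi,k|n}(A) = \left[ \int \phi_{\xi,k}(dx)\, \beta_{k|n}(x) \right]^{-1} \int_A \phi_{\xi,k}(dx)\, \beta_{k|n}(x) ,$$  (11)

where $\phi_{\xi,k}$ is the filtering distribution defined in (1) and $\beta_{k|n}$ is the backward
function.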



3. The coupling construction and coupling sets 

3.1. Coupling constant and the coupling construction 

As outlined in the introduction, our proofs are based on coupling two copies of the
conditional chain started from two different initial conditions. For any two probability
measures $\mu_1$ and $\mu_2$ we define the total variation distance $\|\mu_1 - \mu_2\|_{TV} =
\sup_A |\mu_1(A) - \mu_2(A)|$ and we also recall the identities $\sup_{|f| \le 1} |\mu_1(f) - \mu_2(f)| = 2 \|\mu_1 - \mu_2\|_{TV}$
and $\sup_{0 \le f \le 1} |\mu_1(f) - \mu_2(f)| = \|\mu_1 - \mu_2\|_{TV}$. Let $n$ and $m$ be integers, and let $k \in \{0, \dots, n - m\}$.
Define the $m$-skeleton of the forward smoothing kernel as follows:

$$F_{k,m|n} \triangleq F_{km+m-1|n} \cdots F_{km|n} .$$  (12)

Definition 3 (Coupling constant of a set). Let $n$ and $m$ be integers, and let $k \in
\{0, \dots, n - m\}$. The coupling constant of the set $C \subset \mathsf{X} \times \mathsf{X}$ is defined as

$$\epsilon_{k,m|n}(C) \triangleq 1 - \frac{1}{2} \sup_{(x,x') \in C} \| F_{k,m|n}(x, \cdot) - F_{k,m|n}(x', \cdot) \|_{TV} .$$  (13)

The definition of the coupling constant implies that, for any $(x,x') \in C$,

$$F_{k,m|n}(x, A) \wedge F_{k,m|n}(x', A) \ge \epsilon_{k,m|n}(C)\, \nu_{k,m|n}(x, x'; A) ,$$  (14)

where

$$\nu_{k,m|n}(x, x'; A) \triangleq \frac{(F_{k,m|n}(x, \cdot) \wedge F_{k,m|n}(x', \cdot))(A)}{(F_{k,m|n}(x, \cdot) \wedge F_{k,m|n}(x', \cdot))(\mathsf{X})} ,$$  (15)

where, for any measures $\mu$ and $\nu$ on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$, $\mu \wedge \nu$ is the largest measure for which
$(\mu \wedge \nu)(A) \le \min(\mu(A), \nu(A))$, for all $A \in \mathcal{B}(\mathsf{X})$.
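
For distributions on a finite set, the measure $\mu \wedge \nu$ is simply the pointwise minimum of the two probability vectors, and its total mass equals one minus half the $L_1$ distance between them. The sketch below illustrates this with two toy vectors `p` and `q`, standing in for two rows $F(x, \cdot)$ and $F(x', \cdot)$ of a kernel; the function names are illustrative.

```python
def half_l1_distance(p, q):
    """Half the L1 distance, i.e. sup_A |p(A) - q(A)| for probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def overlap(p, q):
    """Return the mass of p ^ q and the normalised minorising probability.

    The second element is None when the two vectors have disjoint supports.
    """
    m = [min(a, b) for a, b in zip(p, q)]
    mass = sum(m)
    nu = [v / mass for v in m] if mass > 0 else None
    return mass, nu


p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
mass, nu = overlap(p, q)
dist = half_l1_distance(p, q)
```

By construction both `p` and `q` dominate `mass * nu` pointwise, which is the minorisation used in (14) for a single pair of points.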

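We may now proceed to the coupling construction. Let $n$ be an integer, and for
any $k \in \{0, \dots, \lfloor n/m \rfloor\}$, let $C_{k|n}$ be a set-valued function, $C_{k|n} : \mathsf{Y}^{n+1} \to \mathcal{B}(\mathsf{X}) \otimes \mathcal{B}(\mathsf{X})$,
where $\mathcal{B}(\mathsf{X}) \otimes \mathcal{B}(\mathsf{X})$ is the smallest $\sigma$-algebra containing the sets $A \times B$ with $A, B \in
\mathcal{B}(\mathsf{X})$. We define $\bar{R}_{k,m|n}$ as the Markov transition kernel satisfying, for all $(x,x') \in C_{k|n}$
and for all $A, A' \in \mathcal{B}(\mathsf{X})$,

$$\bar{R}_{k,m|n}(x, x'; A \times A') = \left\{ (1 - \epsilon_{k,m|n})^{-1} \left( F_{k,m|n}(x, A) - \epsilon_{k,m|n}\, \nu_{k,m|n}(x, x'; A) \right) \right\}
\times \left\{ (1 - \epsilon_{k,m|n})^{-1} \left( F_{k,m|n}(x', A') - \epsilon_{k,m|n}\, \nu_{k,m|n}(x, x'; A') \right) \right\} ,$$  (16)

where we have omitted the dependence upon the set $C_{k|n}$ in the definition of the
coupling constant $\epsilon_{k,m|n}$ and of the minorizing probability $\nu_{k,m|n}$. For all $(x,x') \in
\mathsf{X} \times \mathsf{X}$, we define

$$\bar{F}_{k,m|n}(x, x'; \cdot) = F_{k,m|n} \otimes F_{k,m|n}(x, x'; \cdot) ,$$  (17)

where, for two kernels $K$ and $L$ on $\mathsf{X}$, $K \otimes L$ is the tensor product of the kernels $K$
and $L$, i.e., for all $(x, x') \in \mathsf{X} \times \mathsf{X}$ and $A, A' \in \mathcal{B}(\mathsf{X})$,

$$K \otimes L(x, x'; A \times A') = K(x, A)\, L(x', A') .$$  (18)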

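Define the product space $\mathsf{Z} = \mathsf{X} \times \mathsf{X} \times \{0, 1\}$, and the associated product
sigma-algebra $\mathcal{B}(\mathsf{Z})$. Define on the space $(\mathsf{Z}^{\mathbb{N}}, \mathcal{B}(\mathsf{Z})^{\otimes \mathbb{N}})$ a Markov chain $Z_i = (X_i, X'_i, d_i)$,
$i \in \{0, \dots, n\}$, as follows. If $d_i = 1$, then draw $X_{i+1} \sim F_{i,m|n}(X_i, \cdot)$, and set $X'_{i+1} =
X_{i+1}$ and $d_{i+1} = 1$. Otherwise, if $(X_i, X'_i) \in C_{i|n}$, flip a coin with probability of
heads $\epsilon_{i,m|n}$. If the coin comes up heads, then draw $X_{i+1}$ from $\nu_{i,m|n}(X_i, X'_i; \cdot)$, and
set $X'_{i+1} = X_{i+1}$ and $d_{i+1} = 1$. If the coin comes up tails, then draw $(X_{i+1}, X'_{i+1})$
from the residual kernel $\bar{R}_{i,m|n}(X_i, X'_i; \cdot)$ and set $d_{i+1} = 0$. If $(X_i, X'_i) \notin C_{i|n}$, then
draw $(X_{i+1}, X'_{i+1})$ according to the kernel $\bar{F}_{i,m|n}(X_i, X'_i; \cdot)$ and set $d_{i+1} = 0$. For
$\mu$ a probability measure on $\mathcal{B}(\mathsf{Z})$, denote by $P^Y_\mu$ the probability measure induced by
the Markov chain $Z_i$, $i \in \{0, \dots, n\}$, with initial distribution $\mu$. It is then easily
checked that, for any $i \in \{0, \dots, \lfloor n/m \rfloor\}$, any initial distributions $\xi$ and $\xi'$ and
any $A \in \mathcal{B}(\mathsf{X})$,

$$P^Y_{\xi \otimes \xi' \otimes \delta_0}(Z_i \in A \times \mathsf{X} \times \{0, 1\}) = \phi_{\xi, im|n}(A) ,$$

$$P^Y_{\xi \otimes \xi' \otimes \delta_0}(Z_i \in \mathsf{X} \times A \times \{0, 1\}) = \phi_{\xi', im|n}(A) ,$$

where $\delta_x$ is the Dirac measure, $\otimes$ is the tensor product of measures and $\phi_{\xi,k|n}$
is the marginal posterior distribution given by (11).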
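
One step of this construction can be written out explicitly on a finite state space. The sketch below is a simplified illustration, not the construction of the text in full generality: it takes two toy probability vectors `p` and `q` (standing in for $F(x, \cdot)$ and $F(x', \cdot)$ at one pair of points) and uses the full overlap mass as the coin probability. The joint law is "with probability eps, both chains take a common draw from the minorising probability; otherwise they move independently from the two residuals", and the marginals are preserved.

```python
def coupled_joint(p, q):
    """Joint law J[i][j] of (X, X') after one coupling step from laws p and q.

    eps is the mass of p ^ q. With probability eps the two copies take a common
    value drawn from (p ^ q) / eps; with probability 1 - eps they are drawn
    independently from the residuals (p - p^q)/(1 - eps) and (q - p^q)/(1 - eps).
    """
    d = len(p)
    m = [min(a, b) for a, b in zip(p, q)]
    eps = sum(m)
    J = [[0.0] * d for _ in range(d)]
    for i in range(d):
        J[i][i] += m[i]  # coupled part: eps * nu(i) = m[i]
    if 1.0 - eps > 1e-15:  # residual part only exists when p != q
        for i in range(d):
            for j in range(d):
                J[i][j] += (p[i] - m[i]) * (q[j] - m[j]) / (1.0 - eps)
    return J, eps


p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
J, eps = coupled_joint(p, q)
row_sums = [sum(row) for row in J]
col_sums = [sum(J[i][j] for i in range(3)) for j in range(3)]
prob_equal = sum(J[i][i] for i in range(3))
```

The row and column sums of `J` recover `p` and `q`, so each copy still follows its own kernel, while the diagonal mass `prob_equal` is the probability that the two copies meet at this step.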
Note that $d_i$ is the bell variable, which indicates whether the chains have
coupled ($d_i = 1$) or not ($d_i = 0$) by time $i$. Define the coupling time

$$T = \inf\{k \ge 1,\ d_k = 1\} ,$$  (19)

with the convention $\inf \emptyset = \infty$. By the Lindvall inequality, the total variation distance
between the filtering distributions associated with two different initial distributions
$\xi$ and $\xi'$, i.e., $P_\xi(X_n \in \cdot \mid Y_{0:n})$ and $P_{\xi'}(X_n \in \cdot \mid Y_{0:n})$, is bounded by the tail
distribution of the coupling time,

$$\| P_\xi(X_n \in \cdot \mid Y_{0:n}) - P_{\xi'}(X_n \in \cdot \mid Y_{0:n}) \|_{TV} \le P^Y_{\xi \otimes \xi' \otimes \delta_0}(T > \lfloor n/m \rfloor) .$$  (20)

In the following section, we consider several conditions that allow us to bound the tail
distribution of the coupling time.

3.2. Coupling sets 

Of course, the construction above is of interest only if we can find set-valued functions
$C_{k|n}$ whose coupling constant $\epsilon_{k,m|n}(C_{k|n})$ is non-zero 'most of the time'.
Recall that these quantities are typically functions of the whole trajectory $y_{0:n}$. It is
not always easy to find such sets, because the definition of the coupling constant
involves products of the forward smoothing kernels $F_{k|n}$, which are not easy to handle.
In some situations (but not always), it is possible to identify appropriate sets from
the properties of the unconditional transition kernel $Q$.

Definition 4 (Strong small set). A set $C \in \mathcal{B}(\mathsf{X})$ is a strong small set for the
transition kernel $Q$ if there exist a measure $\nu_C$ and constants $\sigma_-(C) > 0$ and
$\sigma_+(C) < \infty$ such that, for all $x \in C$ and $A \in \mathcal{B}(\mathsf{X})$,

$$\sigma_-(C)\, \nu_C(A) \le Q(x, A) \le \sigma_+(C)\, \nu_C(A) .$$  (21)

The following Proposition helps to characterize appropriate sets where coupling may
occur with a positive probability from products of strong small sets.

Proposition 1. Assume that $C$ is a strong small set. Then, for any $n$ and any
$k \in \{0, \dots, n\}$, $C \times C$ is a coupling set for the forward smoothing kernels $F_{k|n}$; more
precisely, there exists a probability distribution $\nu_{k|n}$ such that, for any $A \in \mathcal{B}(\mathsf{X})$,

$$\inf_{x \in C} F_{k|n}(x, A) \ge \frac{\sigma_-(C)}{\sigma_+(C)}\, \nu_{k|n}(A) .$$

Proof. The proof is postponed to the appendix. □

Assume that $\mathsf{X} = \mathbb{R}^d$, and that the kernel satisfies the pseudo-mixing condition
(3). Let $C$ be a compact set with diameter $d = \operatorname{diam}(C)$ large enough so that
(3) is satisfied. Then, for any $n$ and any $k \in \{0, \dots, n\}$, $\bar{C} = C \times C$ is a coupling
set for $F_{k|n}$, and $\epsilon(\bar{C})$ may be chosen to be equal to $\epsilon_-(d)/\epsilon_+(d)$. (fl^ gives non-trivial
examples of pseudo-mixing Markov chains which are not uniformly ergodic.
Nevertheless, though the existence of small sets is automatically guaranteed for
phi-irreducible Markov chains, the conditions imposed for the existence of a strong
small set are much more stringent. As shown below, it is sometimes worthwhile to
consider coupling sets which are much larger than products of strong small sets.
The easiest situation is when the coupling constant of the whole state space, $\epsilon_{k,m|n}(\mathsf{X} \times
\mathsf{X})$, is bounded away from zero for sufficiently many trajectories $y_{0:n}$; for unconditional Markov
chains, this property occurs when the chain is uniformly ergodic (i.e., satisfies the
Doeblin condition). This is still the case here, though now the constants may depend
on the observations $Y$'s. As stressed in the discussion, perhaps surprisingly,
we will find non-trivial examples where the coupling constant $\epsilon_{k,m|n}(\mathsf{X} \times \mathsf{X})$ is
bounded away from zero for all $y_{0:n}$, whereas the underlying unconditional Markov
chain is not uniformly geometrically ergodic. We state without proof the following
elementary result.

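Theorem 2. Let $n$ be an integer and $m \ge 1$. Then,

$$\| \phi_{\xi,n} - \phi_{\xi',n} \|_{TV} \le \prod_{k=0}^{\lfloor n/m \rfloor} \left\{ 1 - \epsilon_{k,m|n}(\mathsf{X} \times \mathsf{X}) \right\} .$$

Remark 1. Consider the case where the kernel is uniformly ergodic, i.e.,

$$\sigma_- \triangleq \inf_{(x,x') \in \mathsf{X} \times \mathsf{X}} q(x, x') > 0 \quad \text{and} \quad \sigma_+ \triangleq \sup_{(x,x') \in \mathsf{X} \times \mathsf{X}} q(x, x') < \infty .$$

One may thus take $m = 1$ and, using Proposition 1, $\epsilon_{k,1|n}(\mathsf{X} \times \mathsf{X}) \ge \sigma_-/\sigma_+$. In such
a case, $\| \phi_{\xi,n} - \phi_{\xi',n} \|_{TV} \le (1 - \sigma_-/\sigma_+)^n$.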
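
The bound of Remark 1 is easy to observe numerically on a finite state space, where the density $q$ is taken with respect to the counting measure, so that $\sigma_-$ and $\sigma_+$ are the smallest and largest entries of the transition matrix. The sketch below is an illustration under that assumption; the model (`Q`, `g`, `ys`) is a toy choice and the function names are illustrative.

```python
def hmm_filter(xi, Q, g, ys):
    """Filtering distribution of X_n given y_0..y_n for a finite-state HMM."""
    d = len(Q)
    w = [xi[x] * g(x, ys[0]) for x in range(d)]
    total = sum(w)
    phi = [v / total for v in w]
    for y in ys[1:]:
        # Predict with Q, then correct with the likelihood of the new observation.
        w = [sum(phi[x] * Q[x][x2] for x in range(d)) * g(x2, y) for x2 in range(d)]
        total = sum(w)
        phi = [v / total for v in w]
    return phi


def doeblin_bound(Q, n):
    """(1 - sigma_- / sigma_+)^n with sigma_-, sigma_+ the extreme entries of Q."""
    flat = [v for row in Q for v in row]
    return (1.0 - min(flat) / max(flat)) ** n


def g(x, y):
    """Toy observation density."""
    return 0.8 if x == y else 0.2


Q = [[0.6, 0.4], [0.3, 0.7]]
ys = [0, 1, 1, 0, 1]
n = len(ys) - 1
phi_a = hmm_filter([1.0, 0.0], Q, g, ys)  # filter started at state 0
phi_b = hmm_filter([0.0, 1.0], Q, g, ys)  # filter started at state 1
tv = 0.5 * sum(abs(a - b) for a, b in zip(phi_a, phi_b))
bound = doeblin_bound(Q, n)
```

Here `tv` is the total variation distance (in the $\sup_A$ normalisation) between the two filters at time $n$, and it stays below `bound` whatever observation sequence is plugged in.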

To go beyond this example, we have to find verifiable conditions upon which we 
may ascertain that X x X is an m-coupling set. 

Definition 5 (Uniform accessibility). Let $k, \ell, n$ be integers satisfying $\ell \ge 1$ and
$k \in \{0, \dots, n - \ell\}$. A set $C$ is uniformly accessible for the forward smoothing kernels
$F_{k,\ell|n}$ if there exists a constant $\kappa_{k,\ell}(C) > 0$ satisfying

$$\inf_{x \in \mathsf{X}} F_{k,\ell|n}(x, C) \ge \kappa_{k,\ell}(C) .$$  (22)

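The next step is to find conditions under which a set is uniformly accessible. For
any set $A \in \mathcal{B}(\mathsf{X})$, define the function $\alpha : \mathsf{Y}^{\ell} \to [0, 1]$,

$$\alpha(y_{1:\ell}; A) \triangleq \inf_{(x_0, x_{\ell+1}) \in \mathsf{X} \times \mathsf{X}} \frac{W[y_{1:\ell}](x_0, x_{\ell+1}; A)}{W[y_{1:\ell}](x_0, x_{\ell+1}; \mathsf{X})} = \left( 1 + \tilde{\alpha}(y_{1:\ell}; A) \right)^{-1} ,$$  (23)

where we have set

$$W[y_{1:\ell}](x_0, x_{\ell+1}; A) \triangleq \int \cdots \int \prod_{i=1}^{\ell} q(x_{i-1}, x_i)\, g(x_i, y_i)\; q(x_\ell, x_{\ell+1})\, 1_A(x_\ell)\, \mu(dx_{1:\ell}) ,$$  (24)

and

$$\tilde{\alpha}(y_{1:\ell}; A) \triangleq \sup_{(x_0, x_{\ell+1}) \in \mathsf{X} \times \mathsf{X}} \frac{W[y_{1:\ell}](x_0, x_{\ell+1}; A^c)}{W[y_{1:\ell}](x_0, x_{\ell+1}; A)} .$$  (25)

Of course, the situations of interest are when $\alpha(y_{1:\ell}; A)$ is positive or, equivalently,
$\tilde{\alpha}(y_{1:\ell}; A) < \infty$. In such a case, we may prove the following uniform accessibility
condition: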

Proposition 3. For any integer $n$ and any $k \in \{0, \dots, n - \ell\}$,

$$\inf_{x \in \mathsf{X}} F_{k,\ell|n}(x, C) \ge \alpha(Y_{k+1:k+\ell}; C) .$$  (26)

If in addition $C$ is a strong small set for $Q$, then $\mathsf{X} \times \mathsf{X}$ is an $(\ell + 1)$-coupling set,

$$\inf_{x \in \mathsf{X}} F_{k+\ell+1|n} \cdots F_{k|n}(x, A) \ge \frac{\sigma_-(C)}{\sigma_+(C)}\, \alpha(Y_{k+1:k+\ell}; C) .$$  (27)

The proof is given in Section 6.

4.1. Examples

4.1.1. Bounded noise

Assume that a Markov chain $\{X_k\}_{k \ge 0}$ in $\mathsf{X} = \mathbb{R}^{d_X}$ is observed in bounded noise.
The case of bounded error is of course particular, because the observations of the
$Y$'s make it possible to locate the corresponding $X$'s within a set. More precisely, we assume
that $\{X_k\}_{k \ge 0}$ is a Markov chain with transition kernel $Q$ having density $q$ with
respect to the Lebesgue measure and $Y_k = b(X_k) + V_k$ where

• $\{V_k\}$ is an i.i.d. sequence, independent of $\{X_k\}$, with density $p_V$. In addition, $p_V(|x|) = 0$
for $|x| \ge M$.

• The transition density $(x, x') \mapsto q(x, x')$ is strictly positive and continuous.

• The level sets of $b$, $\{x \in \mathsf{X} : |b(x)| \le K\}$, are compact.

This case has already been considered by (0), using projective Hilbert metric
techniques. We will compute an explicit lower bound for the coupling constant
$\epsilon_{k,2|n}(\mathsf{X} \times \mathsf{X})$, and will then prove, under mild additional assumptions on the
distribution of the $Y$'s, that the chain forgets its initial conditions geometrically fast.

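For $y \in \mathsf{Y}$, denote $C(y) \triangleq \{x \in \mathsf{X},\ |b(x)| \le |y| + M\}$. Note that, for any $x \in \mathsf{X}$ and
$A \in \mathcal{B}(\mathsf{X})$,

$$F_{k+1|n} F_{k|n}(x, A) = \frac{\iint q(x, x_{k+1})\, g(x_{k+1}, Y_{k+1})\, q(x_{k+1}, x_{k+2})\, g(x_{k+2}, Y_{k+2})\, 1_A(x_{k+2})\, \beta_{k+2|n}(x_{k+2})\, dx_{k+1}\, dx_{k+2}}{\iint q(x, x_{k+1})\, g(x_{k+1}, Y_{k+1})\, q(x_{k+1}, x_{k+2})\, g(x_{k+2}, Y_{k+2})\, \beta_{k+2|n}(x_{k+2})\, dx_{k+1}\, dx_{k+2}} .$$

Since $q$ is continuous and positive, for any compact sets $C$ and $C'$, $\inf_{C \times C'} q(x, x') > 0$
and $\sup_{C \times C'} q(x, x') < \infty$. On the other hand, because the observation noise is
bounded, $g(x, y) = g(x, y)\, 1_{C(y)}(x)$. Therefore,

$$F_{k+1|n} F_{k|n}(x, A) \ge \rho(Y_{k+1}, Y_{k+2})\, \nu_{k|n}(A) ,$$

where

$$\rho(y, y') \triangleq \frac{\inf_{C(y) \times C(y')} q(x, x')}{\sup_{C(y) \times C(y')} q(x, x')} ,$$

and

$$\nu_{k|n}(A) \triangleq \frac{\int g(x_{k+2}, Y_{k+2})\, 1_A(x_{k+2})\, \beta_{k+2|n}(x_{k+2})\, dx_{k+2}}{\int g(x_{k+2}, Y_{k+2})\, \beta_{k+2|n}(x_{k+2})\, dx_{k+2}} .$$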

By applying Theorem 2 we obtain that

$$\| \phi_{\xi,n} - \phi_{\xi',n} \|_{TV} \le \prod_{k=0}^{\lfloor n/2 \rfloor} \left\{ 1 - \rho(Y_{2k}, Y_{2k+1}) \right\} .$$

Hence, the Markov chain is geometrically ergodic if

$$\liminf_{n \to \infty}\ n^{-1} \sum_{k=0}^{\lfloor n/2 \rfloor} \rho(Y_{2k}, Y_{2k+1}) > 0 , \quad \text{a.s.}$$

This property holds under many different assumptions on the observations $Y_{0:n}$ and,
in particular, if the observations follow a model which is 'approximately equal' to the
assumed one.
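
To make the quantity $\rho$ concrete, consider a special case that is not worked out in the text and is assumed here purely for illustration: a scalar state with Gaussian transition density $q(x, x') \propto \exp(-(x' - a x)^2 / (2 s^2))$ and $b(x) = x$, so that $C(y) = [-(|y| + M), |y| + M]$. On the box $C(y) \times C(y')$ the density is maximal where $x' = a x$ (for instance at the origin) and minimal at the corner where $|x' - a x| = |a|(|y| + M) + (|y'| + M)$, which gives $\rho$ in closed form. The parameters `a`, `s`, `M` and the function names are hypothetical.

```python
import math


def rho(y, y_next, a, s, M):
    """inf/sup ratio of the Gaussian AR transition density over C(y) x C(y_next).

    Assumes q(x, x') proportional to exp(-(x' - a x)^2 / (2 s^2)) and b(x) = x.
    """
    r, r_next = abs(y) + M, abs(y_next) + M
    worst_gap = abs(a) * r + r_next  # largest |x' - a x| over the box
    return math.exp(-(worst_gap ** 2) / (2.0 * s * s))


def forgetting_bound(ys, a, s, M):
    """Product of {1 - rho(y_{2k}, y_{2k+1})} over the complete pairs in ys."""
    bound = 1.0
    for k in range(len(ys) // 2):
        bound *= 1.0 - rho(ys[2 * k], ys[2 * k + 1], a, s, M)
    return bound


ys = [0.1, -0.3, 0.4, 0.2, -0.5, 0.0, 0.3, 0.1]
short_bound = forgetting_bound(ys[:4], a=0.5, s=1.0, M=1.0)
long_bound = forgetting_bound(ys, a=0.5, s=1.0, M=1.0)
```

The bound shrinks with every additional pair of observations, and it shrinks faster when the observations stay in a bounded region, since large $|y|$ makes $\rho$ small.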
4.1.2. Functional autoregressive in noise

It is also of interest to consider cases where both the $X$'s and the $Y$'s are unbounded.
We consider a non-linear non-Gaussian state space model (borrowed from
([12], Example 5.8)). We assume that $X_0 \sim \xi$ and, for $k \ge 1$,

$$X_k = a(X_{k-1}) + U_k ,$$
$$Y_k = b(X_k) + V_k ,$$

where $\{U_k\}$ and $\{V_k\}$ are two independent sequences of random variables, with
probability densities $p_U$ and $p_V$ with respect to the Lebesgue measure on $\mathsf{X} = \mathbb{R}^{d_X}$
and $\mathsf{Y} = \mathbb{R}^{d_Y}$, respectively. In addition, we assume that

• For any $x \in \mathsf{X} = \mathbb{R}^{d_X}$, $p_U(x) = \bar{p}_U(|x|)$ where $\bar{p}_U$ is bounded, bounded
away from zero on $[0, M]$, non-increasing on $[M, \infty[$, and, for some positive
constant $\gamma$,

— r\ Tm > 7 > . (28) 

• The function $a$ is Lipschitz, i.e., there exists a positive constant $a_+$ such that
$|a(x) - a(x')| \le a_+ |x - x'|$, for any $x, x' \in \mathsf{X}$.

• The function $b$ is one-to-one and differentiable, and its Jacobian is bounded and
bounded away from zero.

• For any $y \in \mathsf{Y} = \mathbb{R}^{d_Y}$, $p_V(y) = \bar{p}_V(|y|)$ where $\bar{p}_V$ is a bounded positive lower
semi-continuous function, $\bar{p}_V$ is non-increasing on $[M, \infty[$, and satisfies

$$\Upsilon \triangleq \int_0^\infty [\bar{p}_U(x)]^{-1}\, \bar{p}_V(b_- x)\, [\bar{p}_U(a_+ x)]^{-1}\, dx < \infty ,$$  (29)

where $b_-$ is the lower bound for the Jacobian of the function $b$.

The condition on the state noise $\{U_k\}$ is satisfied by Pareto-type, exponential and
logistic densities, but obviously not by the Gaussian density, whose tails are
too light.

The fact that the tails of the state noise $U$ are heavier than the tails of the
observation noise $V$ (see (29)) plays a key role in the derivations that follow. In
Section 5 we consider a case where this restriction is not needed (e.g., normal).

The following technical lemma (whose proof is postponed to Section 7) shows
that any set with finite diameter is a strong small set.

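Lemma 4. Assume that $\operatorname{diam}(C) < \infty$. Then, for all $x_0 \in C$ and $x_1 \in \mathsf{X}$,

$$\epsilon(C)\, h_C(x_1) \le q(x_0, x_1) \le \epsilon^{-1}(C)\, h_C(x_1) ,$$  (30)

with

$$\epsilon(C) \triangleq \gamma\, \bar{p}_U(\operatorname{diam}(C)) \wedge \inf_{u \le \operatorname{diam}(C) + M} \bar{p}_U(u) \wedge \Big( \sup_{u \le \operatorname{diam}(C) + M} \bar{p}_U(u) \Big)^{-1} ,$$  (31)

$$h_C(x_1) \triangleq 1(d(x_1, a(C)) \le M) + 1(d(x_1, a(C)) > M)\, \bar{p}_U(|x_1 - a(z_0)|) ,$$  (32)

where $\gamma$ is defined in (28) and $z_0$ is an arbitrary element of $C$. In addition, for all
$x_0 \in \mathsf{X}$ and $x_1 \in C$,

$$\nu(C)\, k_C(x_0) \le q(x_0, x_1) ,$$  (33)

with

$$\nu(C) \triangleq \inf_{|u| \le \operatorname{diam}(C) + M} \bar{p}_U ,$$  (34)

$$k_C(x_0) \triangleq 1(d(a(x_0), C) \le M) + 1(d(a(x_0), C) > M)\, \bar{p}_U(|z_1 - a(x_0)|) ,$$  (35)

where $z_1$ is an arbitrary point in $C$.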
By Lemma 4, the denominator of (25) is lower bounded by

$$W[y](x_0, x_2; C) \ge \epsilon(C)\, \nu(C)\, k_C(x_0)\, h_C(x_2) \int_C g(x_1, y)\, dx_1 .$$  (36)

Therefore, we may bound $\tilde{\alpha}(y_1, C)$, defined in (25), by

$$\tilde{\alpha}(y_1, C) \le \Big( \epsilon(C)\, \nu(C) \int_C g(x_1, y_1)\, dx_1 \Big)^{-1}
\times \sup_{x_0, x_2} [k_C(x_0)]^{-1} [h_C(x_2)]^{-1}\, W[y_1](x_0, x_2; C^c) .$$  (37)

In the sequel, we choose $C = C_K(y) \triangleq \{x,\ |x - b^{-1}(y)| \le K\}$, where $K$ is a
constant which will be chosen later. Since, by construction, the diameter of the
set $C_K(y)$ is $2K$ uniformly with respect to $y$, the constants $\epsilon(C_K(y))$ (defined in
(31)) and $\nu(C_K(y))$ (defined in (34)) are functions of $K$ only and are therefore
uniformly bounded from below with respect to $y$. We will first show that, for $K$
large enough, $\int_{C_K(y)} g(x_1, y)\, dx_1$ is uniformly bounded from below, as shown in
Lemma 5 below (whose proof is postponed to Section 7). The following two
Lemmas bound the terms appearing in the RHS of (37).

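Lemma 5.

$$\liminf_{K \to \infty}\ \inf_{y \in \mathsf{Y}} \int_{C_K(y)} \bar{p}_V(|y - b(x)|)\, dx > 0 .$$

We set $z_0 = b^{-1}(y)$ in the definition (32) of $h_{C_K(y)}$ and $z_1 = b^{-1}(y)$ in the definition
(35) of $k_{C_K(y)}$. We denote

$$I_K(x_0, x_2; y) \triangleq [k_{C_K(y)}(x_0)]^{-1} [h_{C_K(y)}(x_2)]^{-1}
\times \int_{C_K^c(y)} \bar{p}_U(|x_1 - a(x_0)|)\, \bar{p}_V(|y - b(x_1)|)\, \bar{p}_U(|x_2 - a(x_1)|)\, dx_1 .$$  (38)

The following Lemma shows that $K$ may be chosen large enough so that $I_K(x_0, x_2; y)$
is uniformly bounded over $x_0$, $x_2$ and $y$.

Lemma 6.

$$\limsup_{K \to \infty}\ \sup_{y \in \mathsf{Y}}\ \sup_{(x_0, x_2) \in \mathsf{X} \times \mathsf{X}} I_K(x_0, x_2; y) < \infty .$$  (39)

The proof is postponed to Section 7.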



5. Pairwise drift conditions 

5.1. The pair-wise drift condition 

In situations where coupling over the whole state space leads to trivial results,
one may still use the coupling argument, but this time over smaller sets. In such
cases, however, we need a device to control the return time of the joint chain to the
set where the two chains are allowed to couple. In this section we obtain results that
are general enough to include the autoregression model with Gaussian innovations
and Gaussian measurement error. Drift conditions are used to obtain bounds on the
coupling time. Consider the following drift condition.

Definition 6 (Pair-wise drift conditions toward a set). Let $n$ be an integer and
$k \in \{0, \dots, n - 1\}$ and let $C_{k|n}$ be a set-valued function $C_{k|n} : \mathsf{Y}^{n+1} \to \mathcal{B}(\mathsf{X}) \otimes \mathcal{B}(\mathsf{X})$.
We say that the forward smoothing kernel $F_{k|n}$ satisfies the pair-wise drift condition
toward the set $C_{k|n}$ if there exist functions $V_{k|n} : \mathsf{X} \times \mathsf{X} \times \mathsf{Y}^{n+1} \to \mathbb{R}$, $V_{k|n} \ge 1$,
functions $\lambda_{k|n} : \mathsf{Y}^{n+1} \to [0, 1)$ and $\rho_{k|n} : \mathsf{Y}^{n+1} \to \mathbb{R}_+$ such that, for any sequence
$y_{0:n} \in \mathsf{Y}^{n+1}$,

$$\bar{R}_{k|n} V_{k+1|n}(x, x') \le \rho_{k|n} , \quad (x, x') \in C_{k|n} ,$$  (40)

$$\bar{F}_{k|n} V_{k+1|n}(x, x') \le \lambda_{k|n} V_{k|n}(x, x') , \quad (x, x') \notin C_{k|n} ,$$  (41)

where $\bar{R}_{k|n}$ is defined in (16) and $\bar{F}_{k|n}$ is defined in (17).

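We set $\epsilon_{k|n} = \epsilon_{k|n}(C_{k|n})$, the coupling constant of the set $C_{k|n}$, and we denote

$$B_{k|n} \triangleq 1 \vee \rho_{k|n} (1 - \epsilon_{k|n}) \lambda_{k|n} .$$  (42)

For any vector $\{a_{i,n}\}_{1 \le i \le n}$, denote by $[\downarrow a]_{(i,n)}$ the $i$-th largest order statistic,
i.e., $[\downarrow a]_{(1,n)} \ge [\downarrow a]_{(2,n)} \ge \cdots \ge [\downarrow a]_{(n,n)}$, and by $[\uparrow a]_{(i,n)}$ the $i$-th smallest order
statistic, i.e., $[\uparrow a]_{(1,n)} \le [\uparrow a]_{(2,n)} \le \cdots \le [\uparrow a]_{(n,n)}$.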

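Theorem 7. Let $n$ be an integer. Assume that for each $k \in \{0, \dots, n - 1\}$, there
exists a set-valued function $C_{k|n} : \mathsf{Y}^{n+1} \to \mathcal{B}(\mathsf{X}) \otimes \mathcal{B}(\mathsf{X})$ such that the forward
smoothing kernel $F_{k|n}$ satisfies the pairwise drift condition toward the set $C_{k|n}$.
Then, for any probabilities $\xi$ and $\xi'$ on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$,

$$\| \phi_{\xi,n} - \phi_{\xi',n} \|_{TV} \le \min_{1 \le m \le n} A_{m,n} ,$$  (43)

where

$$A_{m,n} \triangleq \prod_{i=1}^{m} \left( 1 - [\uparrow \epsilon]_{(i|n)} \right) + \prod_{i=0}^{n} \lambda_{i|n} \prod_{i=0}^{m} [\downarrow B]_{(i|n)}\ \xi \otimes \xi'(V_0) .$$  (44)

The proof is in Section 7.1.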

Corollary 8. If there exists a sequence $\{m(n)\}$ of integers satisfying $m(n) \le n$
for any integer $n$, $\lim_{n \to \infty} m(n) = \infty$, and, P-a.s.

(rn(n) n m{n) \ 

i=0 i=0 1=0 y 

limsup ||(^5,„ - (^5',„||^y , -a.s. . 

n 

Corollary 9. If there exists a sequence $\{m(n)\}$ of integers such that $m(n) \le n$ for
any integer $n$, $\liminf m(n)/n = \alpha > 0$ and, P-a.s.



m(n) -1 ?i -1 n—m{n) 



I 1 ^ ' 2 1 

limsup - ^ log(l - [T £](i|„)) + - ^logAi|„ + - ^ log[i B(,\. 

\ 1=0 i=l i=l 

then there exists $\nu \in (0, 1)$ such that

\\(f>i,n - <l>i',n\\Ty , P^-a.S. . 

5.2. Examples

5.2.1. Gaussian autoregression
Let

$$X_i = \alpha X_{i-1} + \sigma U_i ,$$
$$Y_i = X_i + \tau V_i ,$$
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where \a\ < 1 and {Ui}i>o and {Vi} are i.i.d. standard Gaussian and arc indepen- 
dent from Xq. Let n be an integer and fcG{0,...,?i — 1}. The backward functions 
are given by 

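$$\beta_{k|n}(x) \propto \exp\left(-(\alpha x - m_{k|n})^2/(2\rho^2_{k|n})\right) , \tag{45}$$

where $m_{k|n}$ and $\rho_{k|n}$ can be computed for $k \in \{0,\dots,n-2\}$ using the following backward recursions (see [6])

$$m_{k|n} = \frac{\rho^2_{k+1|n}\,y_{k+1} + \alpha\tau^2 m_{k+1|n}}{\rho^2_{k+1|n} + \alpha^2\tau^2} , \qquad \rho^2_{k|n} = \frac{(\tau^2+\sigma^2)\rho^2_{k+1|n} + \alpha^2\sigma^2\tau^2}{\rho^2_{k+1|n} + \alpha^2\tau^2} , \tag{46}$$

initialized with $m_{n-1|n} = y_n$ and $\rho^2_{n-1|n} = \sigma^2 + \tau^2$. The conditional transition kernel $F_{i|n}(x,\cdot)$ has a density with respect to the Lebesgue measure given by $\phi(\cdot\,;\mu_{i|n}(x),\gamma^2_{i|n})$, where $\phi(z;\mu,\sigma^2)$ is the density of a Gaussian random variable with mean $\mu$ and variance $\sigma^2$ and

$$\mu_{i|n}(x) = \frac{\tau^2\rho^2_{i+1|n}\,\alpha x + \sigma^2\rho^2_{i+1|n}\,y_{i+1} + \alpha\sigma^2\tau^2 m_{i+1|n}}{(\tau^2+\sigma^2)\rho^2_{i+1|n} + \alpha^2\sigma^2\tau^2} , \qquad \gamma^2_{i|n} = \frac{\sigma^2\tau^2\rho^2_{i+1|n}}{(\tau^2+\sigma^2)\rho^2_{i+1|n} + \alpha^2\sigma^2\tau^2} .$$
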

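From (46), it follows that for any $i \in \{0,\dots,n-1\}$, $\sigma^2 \le \rho^2_{i|n} \le \sigma^2 + \tau^2$. This implies that, for any $(x,x') \in \mathsf{X}\times\mathsf{X}$ and any $i \in \{0,\dots,n-1\}$, the function $\mu_{i|n}$ is Lipschitz, with a Lipschitz constant which is uniformly bounded by some $\beta < |\alpha|$,

$$|\mu_{i|n}(x) - \mu_{i|n}(x')| \le \beta|x-x'| , \qquad \beta \overset{\mathrm{def}}{=} |\alpha|\,\frac{\tau^2(\sigma^2+\tau^2)}{(\sigma^2+\tau^2)^2 + \alpha^2\sigma^2\tau^2} , \tag{47}$$

and that the variance is uniformly bounded

$$\gamma_-^2 \overset{\mathrm{def}}{=} \frac{\sigma^2\tau^2}{(1+\alpha^2)\tau^2 + \sigma^2} \le \gamma^2_{i|n} \le \gamma_+^2 \overset{\mathrm{def}}{=} \frac{\sigma^2\tau^2(\sigma^2+\tau^2)}{(\tau^2+\sigma^2)^2 + \alpha^2\sigma^2\tau^2} . \tag{48}$$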

Therefore, for any $c < \infty$, all sets of the form

$$C \overset{\mathrm{def}}{=} \{(x,x') \in \mathsf{X}\times\mathsf{X} : |x-x'| \le c\} , \tag{49}$$

are coupling sets. Note indeed that, for any $i \in \{0,\dots,n-1\}$,

$$\tfrac{1}{2}\|F_{i|n}(x,\cdot) - F_{i|n}(x',\cdot)\|_{\mathrm{TV}} = 2\,\mathrm{erf}\left(\gamma_{i|n}^{-1}|\mu_{i|n}(x) - \mu_{i|n}(x')|\right) \le 2\,\mathrm{erf}(\gamma_-^{-1}\beta c) ,$$

where erf is the error function. More precisely, for any $(x,x') \in C$, any integer $n$ and any $i \in \{0,\dots,n-1\}$,

$$F_{i|n}(x,A) \wedge F_{i|n}(x',A) \ge \epsilon\,\nu_{i|n}(x,x';A) , \quad\text{where}\quad \epsilon \overset{\mathrm{def}}{=} \left(1 - 2\,\mathrm{erf}(\gamma_-^{-1}\beta c)\right) , \tag{50}$$

and $\nu_{i|n}$ is defined as in psp . For $c$ large enough, the drift condition is satisfied with $V(x,x') \overset{\mathrm{def}}{=} 1 + (x-x')^2$:

$$F_{i|n}V(x,x') = 1 + (\mu_{i|n}(x) - \mu_{i|n}(x'))^2 + 2\gamma^2_{i|n} \le 1 + \beta^2(x-x')^2 + 2\gamma_+^2 .$$

The condition (40) is satisfied with

$$\rho_{i|n} \le \rho \overset{\mathrm{def}}{=} (1-\epsilon)^{-1}\left(1 + \beta^2c^2 + 2\gamma_+^2\right) , \tag{51}$$

where $c$ is the width of the coupling set in (49). The condition (41) is satisfied with $\lambda_{i|n} = \tilde\beta^2$ for any $\tilde\beta$ and $c$ satisfying $\beta < \tilde\beta < 1$ and $c^2 > (1 - \tilde\beta^2 + 2\gamma_+^2)/(\tilde\beta^2 - \beta^2)$. It is worthwhile to note that all these bounds are uniform with respect to $i \in \{0,\dots,n-1\}$ and the realization of the observations $y_{0:n}$.

Therefore, for any $m \in \{0,\dots,n\}$, we may upper bound $A_{m,n}$ (defined in (44)) by

$$A_{m,n} \le (1-\epsilon)^m + B^m\tilde\beta^{2n}\left(1 + 2\int\xi(dx)\,x^2 + 2\int\xi'(dx)\,x^2\right)$$

with $B = 1 \vee \rho(1-\epsilon)\tilde\beta^{-2}$, where $\epsilon$ is defined in (50) and $\rho$ is defined in (51). Taking $m = [\delta n]$ for some $\delta > 0$ such that $B^{\delta}\tilde\beta^2 < 1$, this upper bound may be shown to go to zero exponentially fast and uniformly with respect to the observations $y_{0:n}$.
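The uniformity claims of this example can be checked numerically. The sketch below is illustrative only: it assumes the observation equation $Y_i = X_i + \tau V_i$ and the closed forms obtained by completing the square in the Gaussian densities, and the function names (`backward_recursion`, `smoothing_constants`, `smoothing_variance`) are hypothetical. It runs the backward recursion (46) on an arbitrary observation record and returns the constants $\beta$, $\gamma_-^2$, $\gamma_+^2$ of (47)-(48).

```python
def backward_recursion(y, alpha, sigma2, tau2):
    """Run the backward recursion (46) for k = n-1, ..., 0.

    y: observations y_0, ..., y_n; sigma2, tau2: noise variances.
    Returns the lists (m_{k|n}) and (rho^2_{k|n}), k = 0..n-1.
    """
    n = len(y) - 1
    m = [0.0] * n
    rho2 = [0.0] * n
    # Initialization: beta_{n-1|n}(x) is the N(alpha x, sigma2 + tau2) density at y_n.
    m[n - 1], rho2[n - 1] = y[n], sigma2 + tau2
    for k in range(n - 2, -1, -1):
        d = rho2[k + 1] + alpha**2 * tau2
        m[k] = (rho2[k + 1] * y[k + 1] + alpha * tau2 * m[k + 1]) / d
        rho2[k] = ((sigma2 + tau2) * rho2[k + 1] + alpha**2 * sigma2 * tau2) / d
    return m, rho2


def smoothing_variance(rho2_next, alpha, sigma2, tau2):
    """Variance gamma^2_{i|n} of the forward smoothing kernel, given rho^2_{i+1|n}."""
    return sigma2 * tau2 * rho2_next / (
        (sigma2 + tau2) * rho2_next + alpha**2 * sigma2 * tau2
    )


def smoothing_constants(alpha, sigma2, tau2):
    """Uniform constants beta (47) and gamma_-^2, gamma_+^2 (48)."""
    s = sigma2 + tau2
    denom = s**2 + alpha**2 * sigma2 * tau2
    beta = abs(alpha) * tau2 * s / denom
    gamma2_minus = sigma2 * tau2 / ((1 + alpha**2) * tau2 + sigma2)
    gamma2_plus = sigma2 * tau2 * s / denom
    return beta, gamma2_minus, gamma2_plus
```

Whatever the observations, the computed $\rho^2_{k|n}$ stay in $[\sigma^2, \sigma^2+\tau^2]$, which is what makes the Lipschitz and variance bounds independent of $y_{0:n}$.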

5.2.2. State space models with strongly unimodal distributions

The Gaussian example can be generalized to the case where the distributions of the state noise and of the measurement noise are strongly unimodal. Recall that a density is strongly unimodal if its logarithm is concave.

First note that if $f$ and $g$ are two strongly unimodal densities, then the density $h = fg/\int fg$ is also strongly unimodal, with a mode that lies between the two modes; the second-order derivative of $\log h$ is smaller than the sum of the second-order derivatives of $\log f$ and $\log g$. Let the state noise density be denoted by $p_U(\cdot) = e^{\phi(\cdot)}$ and that of the measurement errors be $p_V(\cdot) = e^{\psi(\cdot)}$. Define $\bar\beta_{i|n}$ by the recursion operating on the decreasing indices

$$\bar\beta_{i|n}(x) = p_V(y_i - x)\int q(x,x_{i+1})\,\bar\beta_{i+1|n}(x_{i+1})\,dx_{i+1} , \tag{52}$$

with the initial condition $\bar\beta_{n|n}(x) = p_V(y_n - x)$. These functions are the conditional densities of the observations $Y_{i:n}$ given $X_i = x$. They are related to the backward function through the relation $\bar\beta_{i|n}(x) \overset{\mathrm{def}}{=} \beta_{i|n}(x)\,p_V(y_i - x)$. We denote $\psi_{i|n}(x) \overset{\mathrm{def}}{=} \log\bar\beta_{i|n}(x)$. Now,

$$\psi_{i|n}(x) = \psi(Y_i - x) + \log\int p_U(z - \alpha x)\,\bar\beta_{i+1|n}(z)\,dz .$$

Under the stated assumptions, the forward smoothing kernel $F_{i|n}$ has a density with respect to the Lebesgue measure which is given by

$$f_{i|n}(x_i,x_{i+1}) = p_U(x_{i+1} - \alpha x_i)\,\bar\beta_{i+1|n}(x_{i+1}) \Big/ \int p_U(z - \alpha x_i)\,\bar\beta_{i+1|n}(z)\,dz . \tag{53}$$

Denote by $\mathrm{Cov}_{i|n,x}$ the covariance function with respect to the forward smoothing kernel density. We recall that for any probability distribution $P$ on $(\mathsf{X},\mathcal{B}(\mathsf{X}))$ and any two increasing measurable functions $f$ and $g$ which are square integrable with respect to $P$, the covariance of $f$ and $g$ with respect to $P$ is non-negative. Hence,

$$\psi''_{i|n}(x) = \psi''(Y_i - x) - \alpha^2\left[\frac{\int p'_U(z-\alpha x)\,\bar\beta'_{i+1|n}(z)\,dz}{\int p_U(z-\alpha x)\,\bar\beta_{i+1|n}(z)\,dz} - \frac{\int p'_U(z-\alpha x)\,\bar\beta_{i+1|n}(z)\,dz\int p_U(z-\alpha x)\,\bar\beta'_{i+1|n}(z)\,dz}{\left(\int p_U(z-\alpha x)\,\bar\beta_{i+1|n}(z)\,dz\right)\left(\int p_U(z-\alpha x)\,\bar\beta_{i+1|n}(z)\,dz\right)}\right] \le \psi''(Y_i - x) , \tag{54}$$


where we used a direct differentiation, integration by parts, and the fact that both $\phi'$ and $\psi'_{i+1|n}$ are monotone non-increasing functions (the last statement follows by applying (54) inductively from $n$ backward).

We conclude that $\psi_{i|n}$ is strongly unimodal with curvature at least that of the original likelihood function. Hence the curvature of the logarithm of the forward smoothing density is smaller than the sum of the curvatures of the state and of the measurement noise,

$$[\log f_{i|n}(x_i,x_{i+1})]'' \le \phi''(x_{i+1} - \alpha x_i) + \psi''(Y_{i+1} - x_{i+1}) \le -c , \tag{55}$$

where the derivative is taken with respect to $x_{i+1}$ and

$$c = -\max_{x}\phi''(x) - \max_{x}\psi''(x) . \tag{56}$$

Lemma 10 shows that the variance of $X_{i+1}$ given $X_i$ and $Y_{i+1:n}$ is uniformly bounded:

$$v_{i|n}(x) \overset{\mathrm{def}}{=} \int\left(x_{i+1} - \int x_{i+1}f_{i|n}(x,x_{i+1})\,dx_{i+1}\right)^2 f_{i|n}(x,x_{i+1})\,dx_{i+1} \le c^{-1} ,$$

where $c$ is defined in (56). Now let

$$e_{i|n}(x) \overset{\mathrm{def}}{=} \int x_{i+1}f_{i|n}(x,x_{i+1})\,dx_{i+1} .$$

Similarly as above,

$$\frac{de_{i|n}}{dx}(x) = -\alpha\,\mathrm{Cov}_{i|n,x}\left(Z,\phi'(Z - \alpha x)\right) .$$

Note that $x_{i+1}\mapsto e_{i|n}(x) - x_{i+1}$, $x_{i+1}\mapsto\phi'(x_{i+1} - \alpha x)$, and $x_{i+1}\mapsto\psi'_{i+1|n}(x_{i+1})$ are monotone non-increasing and therefore their correlation is non-negative with respect to any probability measure. Hence

$$\left|\frac{de_{i|n}}{dx}(x)\right| = |\alpha|\,\frac{\int(e_{i|n}(x) - x_{i+1})\,\phi'(x_{i+1} - \alpha x)\,e^{\phi(x_{i+1}-\alpha x)+\psi_{i+1|n}(x_{i+1})}\,dx_{i+1}}{\int e^{\phi(x_{i+1}-\alpha x)+\psi_{i+1|n}(x_{i+1})}\,dx_{i+1}}$$

$$\le |\alpha|\,\frac{\int(e_{i|n}(x) - x_{i+1})\left(\phi'(x_{i+1} - \alpha x) + \psi'_{i+1|n}(x_{i+1})\right)e^{\phi(x_{i+1}-\alpha x)+\psi_{i+1|n}(x_{i+1})}\,dx_{i+1}}{\int e^{\phi(x_{i+1}-\alpha x)+\psi_{i+1|n}(x_{i+1})}\,dx_{i+1}} = |\alpha|$$

by integration by parts. Put as before $V(x,x') = 1 + (x-x')^2$. It follows from the discussion above that

$$F_{i|n}V(x,x') = 1 + (e_{i|n}(x) - e_{i|n}(x'))^2 + v_{i|n}(x) + v_{i|n}(x') ,$$

where $v_{i|n}(x)$ and $v_{i|n}(x')$ are uniformly bounded with respect to $x$ and $x'$ and $|e_{i|n}(x) - e_{i|n}(x')| \le |\alpha||x-x'|$. The rest of the argument is like that for the normal-normal case.
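The contraction property $|e_{i|n}(x) - e_{i|n}(x')| \le |\alpha||x-x'|$ can be illustrated numerically. The following sketch is not from the original text: the function `smoothing_mean` is hypothetical, and the log-densities chosen in the usage note (a hyperbolic-secant state noise and a concave stand-in for $\psi_{i+1|n}$) are assumptions picked only because they are log-concave. The conditional mean is approximated by a Riemann sum on a fixed grid.

```python
import math


def smoothing_mean(x, alpha, phi, psi_next, lo=-30.0, hi=30.0, steps=6001):
    """Approximate e_{i|n}(x), the mean of the density proportional to
    exp(phi(z - alpha * x) + psi_next(z)), by a Riemann sum on [lo, hi].

    phi: log-density of the state noise (assumed concave).
    psi_next: stand-in for psi_{i+1|n} (assumed concave).
    """
    h = (hi - lo) / (steps - 1)
    num = den = 0.0
    for i in range(steps):
        z = lo + i * h
        w = math.exp(phi(z - alpha * x) + psi_next(z))  # unnormalized density
        num += z * w
        den += w
    return num / den


def max_slope(alpha, phi, psi_next, xs, dx=1e-3):
    """Largest finite-difference slope of x -> e_{i|n}(x) over the points xs."""
    return max(
        abs(smoothing_mean(x + dx, alpha, phi, psi_next)
            - smoothing_mean(x, alpha, phi, psi_next)) / dx
        for x in xs
    )
```

For instance, with `phi = lambda u: -math.log(math.cosh(u))` and `psi_next = lambda z: -0.5 * math.log(math.cosh(z - 1.0))`, the value returned by `max_slope(0.8, phi, psi_next, [-3 + 0.5 * k for k in range(13)])` should not exceed $0.8$ up to quadrature error.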

We conclude the argument by stating and proving a lemma which was used above.

Lemma 10. Suppose that $Z$ is a random variable with probability density function $f$ satisfying $\sup_x(\partial^2/\partial x^2)\log f(x) \le -c$. Then, $Z$ is square integrable and $\mathrm{Var}(Z) \le c^{-1}$.

Proof. Suppose, w.l.o.g., that the maximum of $f$ is at 0. Under the stated assumption, there exist constants $a > 0$ and $b$ such that $f(x) \le a\,e^{-c(x-b)^2/2}$. This implies that $Z$ is square integrable. Denote $z\mapsto\zeta(z) = \log f(z) + cz^2/2$, which by assumption is a concave function. Let $m$ be the mean of $Z$. Then

$$\mathrm{Var}(Z) = c^{-1}\int(z-m)\left(cz - \zeta'(z)\right)e^{\zeta(z)-cz^2/2}\,dz + c^{-1}\int(z-m)\,\zeta'(z)\,e^{\zeta(z)-cz^2/2}\,dz .$$

By construction, $z\mapsto\zeta'(z)$ is a non-increasing function. Since the inequality $\mathrm{Cov}(\varphi(Z),\psi(Z)) \ge 0$ holds for any two non-decreasing functions $\varphi$ and $\psi$ which have finite second moment, the second term in the RHS of the previous equation is non-positive. Since $(cz - \zeta'(z))\,e^{\zeta(z)-cz^2/2} = -f'(z)$, the proof follows by integration by parts:

$$\mathrm{Var}(Z) \le -c^{-1}\int(z-m)f'(z)\,dz = c^{-1}\int f(z)\,dz = c^{-1} .$$

□
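Lemma 10 can be sanity-checked on explicit densities. The sketch below is illustrative: `variance_of_density` is a hypothetical helper that normalizes $e^{\log f}$ on a truncated grid by a Riemann sum, so it assumes the mass outside $[-10,10]$ is negligible. For a Gaussian log-density $-cz^2/2$ the bound is attained, while adding a further concave term such as $-z^4/4$ makes the variance strictly smaller.

```python
import math


def variance_of_density(log_f, lo=-10.0, hi=10.0, steps=20001):
    """Variance of the density proportional to exp(log_f) on [lo, hi],
    approximated by a Riemann sum on a uniform grid (normalization included)."""
    h = (hi - lo) / (steps - 1)
    zs = [lo + i * h for i in range(steps)]
    ws = [math.exp(log_f(z)) for z in zs]          # unnormalized density values
    total = sum(ws)
    mean = sum(z * w for z, w in zip(zs, ws)) / total
    return sum((z - mean) ** 2 * w for z, w in zip(zs, ws)) / total


c = 2.0
var_gauss = variance_of_density(lambda z: -c * z * z / 2)               # curvature exactly -c
var_quartic = variance_of_density(lambda z: -c * z * z / 2 - z**4 / 4)  # curvature <= -c
```

Here `var_gauss` should be close to $c^{-1}$ and `var_quartic` below it, in line with the lemma.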

6. Proofs 



Proof of Proposition [H The proof is similar to the one done in (jllh . For x G C, the 
condition, (|2ip implies that 

a^{C)iyc{dx') < ^^^{dx') < a+{C)iyc{dx') . 
dvc 

Plugging the lower and upper bounds in the numerator and the denominator of ([5]) 
yields, 

^ , cr_ X4 ^%!^(rfa;fc+i)/3fc+i|„(a;fc+i)Ai(da;fc+i) 

l'j,|„(Xfc,^) > ^ 

k ^^'%^{dxk+i)fik+i\n{xk+i)n{dxk+i) 

The result is established with 

dcf h ^^^;^{dxk+l)(ik+l\n{xk+l)^J.{dxk+l) 



Ix dur (dXk+l )Pk+l\n{Xk+l )tl{dXk+l ) 



□ 



Proof of proposition For any Xi G X, 
F{X,+eeC\X,=x,,Y,.,^) 

_ J ■■■ J W[Yi+i:i+i]{xt, x^+t+u C)P^+(+i\nixt+i+i)fi{dxi+i,i+i+i) 
J ■■■ J W[Yi+i.,,+i] {xi , Xi+e+i ; X)Pi+i+i\n{xi+i>+i)^(dxt+i;t+i+i) 

J ■■■ J W[Yi+i;i+e]ixi,x^+e+i;X)P^+g^iin{xi+i+i)fj.{dxi+e+i) 
where $W$ is defined in (24). The proof is concluded by noting that, under the stated
assumptions,

W[Y,+i.,i+i]{xi,x,+e+i;C) 
sup —T- J- — > a{Y,+i.,i+e;C) , 

□ 

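6.1. Proof of Theorem 7

Proof. For notational simplicity, we drop the dependence on the sample size $n$. Denote $N_n \overset{\mathrm{def}}{=} \sum_{j=0}^{n} 1_{C_j}(X_j,X'_j)$ and $\epsilon_i \overset{\mathrm{def}}{=} \epsilon(C_i)$. For any $m \in \{1,\dots,n+1\}$, we have:

$$\mathbb{P}_{\xi,\xi',0}(T > n) \le \mathbb{P}_{\xi,\xi',0}(T > n, N_{n-1} \ge m) + \mathbb{P}_{\xi,\xi',0}(T > n, N_{n-1} < m) . \tag{57}$$

The first term on the RHS of the previous equation is the probability that we fail to couple the chains after at least $m$ independent trials; it is bounded by

$$\mathbb{P}_{\xi,\xi',0}(T > n, N_{n-1} \ge m) \le \prod_{i=1}^{m}\left(1 - [\uparrow\epsilon]_{(i)}\right) , \tag{58}$$

where $[\uparrow\epsilon]_{(i)}$ are the smallest order statistics of $(\epsilon_0,\dots,\epsilon_{n-1})$. We consider now the second term in the RHS of (57). Set $B_j \overset{\mathrm{def}}{=} 1 \vee \rho_j(1-\epsilon_j)\lambda_j^{-1}$. On the event $\{N_{n-1} \le m-1\}$,

$$\prod_{j=0}^{n-1} B_j^{1_{C_j}(X_j,X'_j)} \le \prod_{j=1}^{m}[\downarrow B]_{(j)} ,$$

where $[\downarrow B]_{(j)}$ is the $j$-th largest order statistic of $B_0,\dots,B_{n-1}$. Hence,

$$1\{T > n, N_{n-1} < m\} \le \prod_{i=0}^{n-1}\lambda_i\prod_{i=1}^{m}[\downarrow B]_{(i)}\,M_n ,$$

which implies that:

$$\mathbb{P}_{\xi,\xi',0}(T > n, N_{n-1} < m) \le \prod_{i=0}^{n-1}\lambda_i\prod_{i=1}^{m}[\downarrow B]_{(i)}\;\mathbb{E}_{\xi,\xi',0}[M_n] \tag{59}$$

where, for $k \in \{0,\dots,n\}$:

$$M_k \overset{\mathrm{def}}{=} \prod_{i=0}^{k-1}\lambda_i^{-1}\prod_{i=0}^{k-1}B_i^{-1_{C_i}(X_i,X'_i)}\;V_k(X_k,X'_k)\,1\{d_k = 0\} . \tag{60}$$

Since, by construction,

$$\mathbb{E}_{\xi,\xi',0}\left[V_{k+1}(X_{k+1},X'_{k+1})\,1\{d_{k+1} = 0\}\,\middle|\,\mathcal{F}_k\right] \le (1-\epsilon_k)R_kV_{k+1}(X_k,X'_k)\,1_{C_k}(X_k,X'_k) + \lambda_kV_k(X_k,X'_k)\,1_{C_k^c}(X_k,X'_k) ,$$

it is easily shown that $(M_k, k \ge 0)$ is a $(\mathcal{F},\mathbb{P}_{\xi,\xi',0})$-supermartingale, where $\mathcal{F} \overset{\mathrm{def}}{=} (\mathcal{F}_k)_{0\le k\le n}$ with, for $k \ge 0$, $\mathcal{F}_k \overset{\mathrm{def}}{=} \sigma[(X_j,X'_j,d_j), 0 \le j \le k]$. Therefore,

$$\mathbb{E}_{\xi,\xi',0}(M_n) \le \mathbb{E}_{\xi,\xi',0}(M_0) = \xi\otimes\xi'(V_0) .$$

This establishes (43) and concludes the proof.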

□

7. Proofs of Section 4.1.2

To simplify the notations, the dependence of $C(y)$ on $K$ is implicit throughout the section.

Proof of Lemma. Consider first the case $d(x_1,a(C)) \ge M$. For any $z_1 \in a(C)$,

$$M \le |x_1 - a(x_0)| \le |x_1 - z_1| + |z_1 - a(x_0)| \le \mathrm{diam}(C) + |x_1 - z_1| ,$$
$$M \le |x_1 - z_1| \le |x_1 - a(x_0)| + |z_1 - a(x_0)| \le \mathrm{diam}(C) + |x_1 - a(x_0)| .$$

Using that $p_U$ is non-increasing for $u \ge M$ and (28), we obtain

$$p_U(|x_1 - a(x_0)|) \ge p_U(\mathrm{diam}(C) + |x_1 - z_1|) \ge \gamma\,p_U(\mathrm{diam}(C))\,p_U(|x_1 - z_1|) ,$$

and similarly,

$$p_U(|x_1 - z_1|) \ge \gamma\,p_U(\mathrm{diam}(C))\,p_U(|x_1 - a(x_0)|) ,$$

which establishes that (30) holds when $d(x_1,a(C)) \ge M$.

Consider now the case $d(x_1,a(C)) < M$. Since $x_0$ belongs to $C$, then $|x_1 - a(x_0)| \le M + \mathrm{diam}(C)$, which implies that

$$\inf_{u\le M+\mathrm{diam}(C)} p_U(u) \le p_U(|x_1 - a(x_0)|) \le \sup_{u\le M+\mathrm{diam}(C)} p_U(u) ,$$

so that (30) also holds for $d(x_1,a(C)) < M$.

Consider now the second assertion. Assume first that $x_0$ is such that $d(a(x_0),C) \ge M$ and let $z_1$ be an arbitrary point of $C$. Then, for any $x_1 \in C$,

$$M \le |x_1 - a(x_0)| \le |x_1 - z_1| + |z_1 - a(x_0)| \le \mathrm{diam}(C) + |z_1 - a(x_0)| .$$

Using that $p_U$ is monotone decreasing on $[M,\infty)$ and (28),

$$p_U(|x_1 - a(x_0)|) \ge p_U(\mathrm{diam}(C) + |z_1 - a(x_0)|) \ge \gamma\,p_U[\mathrm{diam}(C)]\,p_U(|z_1 - a(x_0)|) . \tag{61}$$

If $d(a(x_0),C) < M$, then for any $x_1 \in C$, $|x_1 - a(x_0)| \le \mathrm{diam}(C) + M$, so that

$$\inf_{|u|\le\mathrm{diam}(C)+M} p_U(u) \le p_U(|x_1 - a(x_0)|) . \tag{62}$$

□

Proof of Lemma. Choose $K$ such that $b_-K \ge M$. If $|b^{-1}(y) - x| \ge K$, then,

$$|y - b(x)| = |b(b^{-1}(y)) - b(x)| \ge b_-|b^{-1}(y) - x| \ge M , \tag{63}$$

and since $p_V$ is non-increasing on the interval $[M,\infty[$, the following inequality holds

pvi\y - b{x)\)dx < [ pvibT^\b-\y) - x\)dx 

x-b-^{y)\>K J\x-b-^{y)\>M 



/>oo 

<bi pvix)dx . 



Since the Jacobian of $b$ is bounded, $\int p_V(|y - b(x)|)\,dx$ is bounded away from zero
by a change of variables. The proof follows.

□

Proof of Lemma. We will establish the results by considering independently the following cases:

1. For any $y$ and any $(x_0,x_2)$ such that $d(a(x_0),C(y)) < M$ and $d(x_2,a[C(y)]) < M$,

I{xo,x2;y) < ^suppc/^ ■ 

2. For any $y$ and any $(x_0,x_2)$ such that $d(a(x_0),C(y)) \ge M$ and $d(x_2,a[C(y)]) < M$,

$$I(x_0,x_2;y) \le \gamma^{-1}(\sup p_U)\int_K^{\infty}[p_U(x)]^{-1}p_V(b_-x)\,dx .$$

3. For any $y$ and any $(x_0,x_2)$ such that $d(a(x_0),C(y)) < M$ and $d(x_2,a[C(y)]) \ge M$,

/(xo,X2;2/) < 7"^ (suppc/) jfel^ + j pv{h-x)[pu{a+x)]~^Ax 

4. For any $y$ and any $(x_0,x_2)$ such that $d(a(x_0),C(y)) \ge M$ and $d(x_2,a[C(y)]) \ge M$,

/(xo,x2;y) < 7"^ 

^ Ik t^^'^^'']"^^^'^^"^^ + [Pc/(a+a;)]"^| dx . 

Proof of Assertion 1. On the set $\{x_0 : d(a(x_0),C(y)) < M\}$, $k_{C(y)}(x_0) = 1$; on the set $\{x_2 : d(x_2,a[C(y)]) < M\}$, $h_{C(y)}(x_2) = 1$. Since $p_U$ is uniformly bounded, the bound follows from Lemma [H] and the choice of $K$. □

Proof of Assertion 2. On the set $\{x_0 : d(a(x_0),C(y)) \ge M\}$, $k_{C(y)}(x_0) = p_U(|b^{-1}(y) - a(x_0)|)$; on the set $\{x_2 : d(x_2,a[C(y)]) < M\}$, $h_{C(y)}(x_2) = 1$. Therefore, for such $(x_0,x_2)$,

$$I(x_0,x_2;y) \le (\sup p_U)\,p_U^{-1}(|b^{-1}(y) - a(x_0)|)\int_{C^c(y)}p_U(|x_1 - a(x_0)|)\,p_V(|y - b(x_1)|)\,dx_1 . \tag{64}$$

We set $\alpha = x_1 - a(x_0)$ and $\beta = b^{-1}(y) - x_1$. Note that $|\alpha + \beta| = |b^{-1}(y) - a(x_0)| \ge d(a(x_0),C(y)) \ge M$. Since $p_U$ is non-increasing on $[M,\infty[$, $p_U(|\alpha+\beta|) \ge p_U(|\alpha| + |\beta|)$, and the condition (28) shows that $(p_U(|\alpha+\beta|))^{-1}p_U(|\alpha|) \le \gamma^{-1}p_U^{-1}(|\beta|)$, which implies

$$p_U^{-1}(|b^{-1}(y) - a(x_0)|)\,p_U(|x_1 - a(x_0)|) \le \gamma^{-1}p_U^{-1}(|b^{-1}(y) - x_1|) . \tag{65}$$

Therefore, plugging (65) into the RHS of (64) yields

$$I(x_0,x_2;y) \le \gamma^{-1}(\sup p_U)\int_{|x_1-b^{-1}(y)|\ge K}p_U^{-1}(|b^{-1}(y) - x_1|)\,p_V(b_-|b^{-1}(y) - x_1|)\,dx_1 \le \gamma^{-1}(\sup p_U)\int_K^{\infty}p_U^{-1}(x)\,p_V(b_-x)\,dx .$$

□

Proof of Assertion 3. On the set $\{x_0 : d(a(x_0),C(y)) < M\}$, $k_{C(y)}(x_0) = 1$; on the set $\{x_2 : d(x_2,a[C(y)]) \ge M\}$, $h_{C(y)}(x_2) = p_U(|x_2 - a[b^{-1}(y)]|)$. Therefore, for such $(x_0,x_2)$,

$$I(x_0,x_2;y) \le (\sup p_U)\,p_U^{-1}(|x_2 - a[b^{-1}(y)]|)\int_{C^c(y)}p_V(|y - b(x_1)|)\,p_U(|x_2 - a(x_1)|)\,dx_1 . \tag{66}$$

We set $\alpha = x_2 - a(x_1)$, $\beta = a(x_1) - a[b^{-1}(y)]$. Since $|\alpha + \beta| \ge d(x_2,a[C(y)]) \ge M$, using as above that $(p_U(|\alpha+\beta|))^{-1}p_U(|\alpha|) \le \gamma^{-1}p_U^{-1}(|\beta|)$, we show

$$p_U^{-1}(|x_2 - a[b^{-1}(y)]|)\,p_U(|x_2 - a(x_1)|) \le \gamma^{-1}p_U^{-1}(|a(x_1) - a[b^{-1}(y)]|) . \tag{67}$$

Since for any $x,x' \in \mathsf{X}$,

$$p_U^{-1}(|a(x) - a(x')|) \le \left(\inf_{u\le M}p_U(u)\right)^{-1}1\{|a(x) - a(x')| \le M\} + p_U^{-1}(a_+|x-x'|)\,1\{|a(x) - a(x')| \ge M\} , \tag{68}$$

the RHS of (66) is therefore bounded by

$$I(x_0,x_2;y) \le \gamma^{-1}(\sup p_U)\int_{|x_1-b^{-1}(y)|\ge K}p_V(b_-|x_1 - b^{-1}(y)|)\left\{\left(\inf_{u\le M}p_U(u)\right)^{-1} + p_U^{-1}(a_+|x_1 - b^{-1}(y)|)\right\}dx_1 .$$

□



Proof of Assertion 4. On the set $\{x_0 : d(a(x_0),C(y)) \ge M\}$, $k_{C(y)}(x_0) = p_U(|b^{-1}(y) - a(x_0)|)$. On the set $\{x_2 : d(x_2,a[C(y)]) \ge M\}$, $h_{C(y)}(x_2) = p_U(|x_2 - a[b^{-1}(y)]|)$. Therefore, for such $(x_0,x_2)$,

$$I(x_0,x_2;y) \le p_U^{-1}(|b^{-1}(y) - a(x_0)|)\,p_U^{-1}(|x_2 - a[b^{-1}(y)]|)\int_{C^c(y)}p_U(|x_1 - a(x_0)|)\,p_V(|y - b(x_1)|)\,p_U(|x_2 - a(x_1)|)\,dx_1 . \tag{69}$$

Using (65), (67), and (68), the RHS of the previous equation is bounded by

$$I(x_0,x_2;y) \le \gamma^{-2}\int_{|x|\ge K}p_U^{-1}(|x|)\,p_V(b_-|x|)\left\{\left(\inf_{u\le M}p_U(u)\right)^{-1} + p_U^{-1}(a_+|x|)\right\}dx .$$

The proof follows. □

□